A lossy compression algorithm for binary redundant memoryless sources is presented. The
proposed scheme is based on sparse graph codes. By introducing a nonlinear function, redundant 
memoryless sequences can be compressed. We propose a linear complexity compressor based 
on the extended belief propagation, into which an inertia term is heuristically introduced, and 
show that it has near-optimal performance for moderate block lengths. 
Channel coding can be considered as a dual problem of lossy
source coding. 1,2) Recent research on error-correcting codes and
lossy source coding has shown that the statistical mechanical
approach can be used to explain such problems. 3) Lossy compression
for memoryless sources has been widely investigated since
Matsunaga and Yamamoto showed that it is possible to attain the
rate-distortion bound asymptotically using low-density
parity-check (LDPC) codes. 4) The upper 5) and lower 6,7) bounds on
the rate-distortion performance of low-density generator-matrix
(LDGM) codes for lossy compression with a given check degree have
been evaluated. Some other lossy compression schemes that have
asymptotic optimality have also been proposed. 8-19)

Efficient compressors, on the other hand, are still under
development. Some efficient encoding algorithms with near-optimal
performance have been proposed, e.g., the nested binary linear
codes, 20) the inertia-term-introduced belief propagation, 10,21)
the survey-propagation-based message passing algorithm, 22) the
bit-flipping-based algorithm, 14) the exhaustive search of small
words into which an original message is divided, 23) the
linear-programming-based algorithm, 24) and the Markov-chain
Monte Carlo (MCMC)-based algorithm. 25)

For redundant memoryless sources, some low-complexity compressors
have been proposed so far, e.g., the near-linear complexity
compressor based on the exhaustive search of small words, 23) the
quadratic complexity compressor based on the bit-flipping-based
algorithm, 14) and the MCMC-based compressor 25) whose complexity
is independent of the sequence length. Another approach to
obtaining near-linear complexity compressors for redundant sources
is to apply the inertia-term-introduced belief propagation. Hosaka
and Kabashima have proposed an algorithm for redundant sources
whose complexity is $O(N^2)$. 10) In this study, we propose a linear
complexity lossy compression algorithm for binary redundant
memoryless sources. It is based on an inertia-term-introduced
belief propagation and uses a nonlinear function and a sparse
matrix such as that of low-density generator-matrix (LDGM) codes.
The proposed algorithm can be regarded as the perceptron-based
one 10) whose edges are pruned down to a finite connectivity, and
it has asymptotic optimality under some constraints. We also show
that it has near-optimal performance for moderate block lengths.

Let us first provide the concepts of the rate-distortion theory 1)
and some notations. Let $x$ be a discrete random variable with an
alphabet $\mathcal{X}$. A source alphabet, a codeword alphabet, and
a reproduced alphabet are $\mathcal{X}$, $\mathcal{S}$, and
$\hat{\mathcal{X}}$, respectively. The compressor $\mathcal{F}$
encodes the $M$ bit source sequence
$x = {}^t(x_1, \cdots, x_M) \in \mathcal{X}^M$ into the $N (< M)$ bit
codeword $\xi = {}^t(\xi_1, \cdots, \xi_N) = \mathcal{F}(x) \in \mathcal{S}^N$.
The decompressor $\mathcal{G}$ generates the $M$ bit reproduced
sequence $\hat{x} = {}^t(\hat{x}_1, \cdots, \hat{x}_M) = \mathcal{G}(\xi) \in \hat{\mathcal{X}}^M$
from the codeword $\xi$. The code rate then becomes $R = N/M (< 1)$.

A distortion measure is a map
$d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$. A
distortion between the sequences
$x = {}^t(x_1, \cdots, x_M) \in \mathcal{X}^M$ and
$\hat{x} = {}^t(\hat{x}_1, \cdots, \hat{x}_M) \in \hat{\mathcal{X}}^M$
is measured by the averaged single-letter distortion as
$d(x, \hat{x}) = \frac{1}{M} \sum_{k=1}^{M} d(x_k, \hat{x}_k)$. A rate
distortion pair $(R, D)$ is said to be achievable if there exists a
sequence of rate distortion codes $(\mathcal{F}, \mathcal{G})$ with
$\mathbb{E}_x[d(x, \hat{x})] \le D$ in the limit $M \to \infty$. The
rate distortion function $R(D)$ is the infimum of the rates $R$ such
that $(R, D)$ is in the rate distortion region of the source for a
given distortion $D$.

We hereafter consider the binary alphabets
$\mathcal{X} = \mathcal{S} = \hat{\mathcal{X}} = \{-1, 1\}$ and a
redundant binary memoryless source whose distribution is given by
$\mu(X = 1) = 1 - p$, $\mu(X = -1) = p$. The parameter $p$ is a
source bias. We use the Hamming distortion

$$ d(x, \hat{x}) = \begin{cases} 0, & \text{if } x = \hat{x}, \\ 1, & \text{if } x \neq \hat{x}, \end{cases} \tag{1} $$

as a distortion measure. The rate-distortion function of a
Bernoulli($p$) i.i.d. source then becomes $R(D) = h_2(p) - h_2(D)$,
where $h_2$ denotes the binary entropy function, which is defined by
$h_2(x) = -x \log_2(x) - (1 - x) \log_2(1 - x)$.
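As a small illustration of the quantities just defined, the following Python sketch evaluates the binary entropy, the formula $R(D) = h_2(p) - h_2(D)$, and the averaged Hamming distortion. The function names `h2`, `rate_distortion`, and `hamming_distortion` are chosen here for illustration and do not come from the paper.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits, using the convention 0 * log2(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rate_distortion(p: float, D: float) -> float:
    """Evaluate R(D) = h2(p) - h2(D); meaningful for 0 <= D <= min(p, 1 - p)."""
    return h2(p) - h2(D)


def hamming_distortion(x, x_hat) -> float:
    """Averaged single-letter Hamming distortion between two sequences."""
    if len(x) != len(x_hat):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, x_hat)) / len(x)
```

For instance, `rate_distortion(0.8, 0.1)` gives the rate needed, in the large-$M$ limit, to reproduce a source with bias $p = 0.8$ within distortion $D = 0.1$.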

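Using a parameter $w = (w_1, w_2) \in (\mathbb{N} \setminus \{0\})^2$,
we first introduce the nonlinear function
$g : \mathbb{N} \to \{-1, 1\}$ defined as

$$ g_w(z) = g_{(w_1, w_2)}(z) = \begin{cases} 1, & \text{if } w_1 < |z| < w_2, \\ -1, & \text{otherwise}, \end{cases} \tag{2} $$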
where the operator ${}^t A$ denotes the transpose of $A$. For a
vector, this function acts componentwise. We here consider the
following decompressor:

$$ \mathcal{G}(\xi) = g_w(A\xi), \tag{3} $$



where $A = (a_{ki}) \in \{-1, 0, 1\}^{M \times N}$ denotes a sparse
regular matrix whose row weight, i.e., the number of nonzero
elements in each row, is $C$. Each nonzero element of the sparse
matrix $A$ takes $\pm 1$ with equal probability. The function
$g_w(z)$ is adjusted so that the bias of the reproduced sequence is
as close as possible to that of the original message. More complex
functions can be considered as the function $g_w(z)$. As will be
discussed later, the complexity of our proposed algorithm is $O(N)$,
but it is proportional to $2^C$. A small $C$ is therefore
preferable, so we here consider a simple nonlinear function that
can easily be adjusted. The compressor is then defined by



$$ \mathcal{F}(x) = \operatorname*{argmin}_{s \in \{-1, 1\}^N} d(x, \mathcal{G}(s)). \tag{4} $$
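To make the decompressor of eq. (3) and the compressor of eq. (4) concrete, here is a minimal Python sketch that enumerates all $2^N$ codewords for a tiny instance. It is an illustration under stated assumptions, not the construction used in the paper: the helper names are hypothetical, and `random_sparse_matrix` fixes only the row weight $C$ (column weights are left random).

```python
import itertools
import random


def g_w(z: int, w1: int, w2: int) -> int:
    """Nonlinear function of eq. (2): +1 if w1 < |z| < w2, otherwise -1."""
    return 1 if w1 < abs(z) < w2 else -1


def random_sparse_matrix(M: int, N: int, C: int, rng: random.Random):
    """Each row is a dict {column: +-1} with exactly C nonzero entries.
    Assumption: only the row weight is fixed; columns are picked at random."""
    return [{i: rng.choice((-1, 1)) for i in rng.sample(range(N), C)}
            for _ in range(M)]


def decompress(A, s, w1: int, w2: int):
    """Eq. (3): apply g_w componentwise to the vector A s."""
    return [g_w(sum(a * s[i] for i, a in row.items()), w1, w2) for row in A]


def hamming(x, x_hat) -> float:
    """Averaged Hamming distortion."""
    return sum(a != b for a, b in zip(x, x_hat)) / len(x)


def compress_exhaustive(A, x, N: int, w1: int, w2: int):
    """Eq. (4) by brute force over all 2^N codewords (tiny N only)."""
    return min(itertools.product((-1, 1), repeat=N),
               key=lambda s: hamming(x, decompress(A, s, w1, w2)))
```

Because $g_w$ depends only on $|z|$, `decompress(A, s, w1, w2)` returns the same sequence for `s` and for its sign-flipped copy, so the minimizer found above is never unique.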



This compressor is identical to
$\mathcal{F}(x) = \operatorname*{argmax}_{s \in \{-1, 1\}^N} p(s; x)$,
which is the maximization of the following distribution

$$ p(s; x) = \frac{1}{Z(\beta)} e^{-\beta M d(x, \mathcal{G}(s))} = \frac{1}{Z(\beta)} \prod_{k=1}^{M} e^{-\beta G_k(s; x_k)}, \tag{5} $$

where $G_k(s; x_k) = \frac{1}{2}(1 - x_k \hat{x}_k)$,
$\hat{x}_k = g_w(\sum_{i \in \mathcal{L}(k)} a_{ki} s_i)$, and
$\mathcal{L}(k) = \{i \mid a_{ki} \neq 0\}$ with the parameter
$\beta > 0$. Here, $Z(\beta)$ denotes a normalization constant of
$p(s; x)$, which is defined by
$Z(\beta) = \sum_s e^{-\beta M d(x, \mathcal{G}(s))}$. The function
$G_k(s; x_k)$ represents the distortion with respect to the $k$ th
bit.

We here consider the large row weight limit, i.e., the case where
$C = N$ holds, which allows us to infer the compression
performance. In this limit, our scheme can be regarded as the
perceptron-based code. 9,10,17,18,26) When $w_2 > N$, these are
equivalent to each other. To make the parameters $w_1$ and $w_2$ of
order unity, we introduce a constant into the decompressor as
$\mathcal{G}(\xi) = g_w(N^{-1/2} A \xi)$. The achievable distortion
$D$ is evaluated as
$D = \lim_{\beta \to \infty} \partial[\beta f(\beta)]/\partial\beta$
via the free energy density
$f(\beta) = (-\beta M)^{-1} \mathbb{E}_{A,x}[\ln Z(\beta)]$, where
$\mathbb{E}$ denotes an expectation operator.

Applying the so-called replica method, the free energy density can
be evaluated as

$$ f(\beta) = -\beta^{-1} \left( p \ln\{e^{-\beta} + (1 - e^{-\beta}) K_w\} + (1 - p) \ln\{e^{-\beta} + (1 - e^{-\beta})(1 - K_w)\} + R \ln 2 \right) $$

within the replica symmetric treatment, where
$K_w = \int_{\{z \in \mathbb{R} \mid g_w(z) = -1\}} (2\pi)^{-1/2} e^{-z^2/2} dz$.
The parameter $K_w$ is identical to the expectation value
$\mathbb{E}_z[\mathbb{I}(g_w(z) = -1)]$, i.e., the probability that
$g_w(z) = -1$, with a random variable $z$ that obeys the standard
normal distribution $\mathcal{N}(0, 1)$, which originates from the
distribution of each element of $N^{-1/2} A \xi$. The compression
performance in this scheme is thus determined by the distribution
of $N^{-1/2} A \xi$ and $g_w$.
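For the two-threshold function of eq. (2) with real thresholds $0 \le w_1 < w_2$, the Gaussian integral defining $K_w$ reduces to $1 - [\mathrm{erf}(w_2/\sqrt{2}) - \mathrm{erf}(w_1/\sqrt{2})]$. The sketch below evaluates this expression together with the replica symmetric free energy density quoted above; the function names are illustrative and not taken from the paper.

```python
import math


def K_gauss(w1: float, w2: float) -> float:
    """P[g_w(z) = -1] for z ~ N(0, 1), i.e. 1 - P[w1 < |z| < w2],
    assuming 0 <= w1 < w2."""
    inside = math.erf(w2 / math.sqrt(2.0)) - math.erf(w1 / math.sqrt(2.0))
    return 1.0 - inside


def free_energy(beta: float, p: float, R: float, K: float) -> float:
    """Replica symmetric free energy density f(beta) as written in the text."""
    e = math.exp(-beta)
    a = e + (1.0 - e) * K          # contribution of source symbols x_k = -1
    b = e + (1.0 - e) * (1.0 - K)  # contribution of source symbols x_k = +1
    return -(p * math.log(a) + (1.0 - p) * math.log(b) + R * math.log(2.0)) / beta
```

With `w1 = 0` and `w2 = 1`, for example, `K_gauss` returns the probability that a standard normal variable falls outside $(-1, 1)$.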

The entropy density of $p(s; x)$ is then obtained as
$s(\beta) = \beta(\partial[\beta f(\beta)]/\partial\beta - f)$. The
entropy density $s(\beta)$ must be non-negative owing to the
definition of $p(s; x)$; however, it takes negative values in the
large $\beta$ region. We therefore evaluate the achievable
distortion $D$ at $\beta_c$, which gives a zero entropy density
($s(\beta_c) = 0$), so this analysis is equivalent to the
Krauth-Mézard approach, which is a kind of one-step replica
symmetry breaking (RSB) treatment. 27)

Using the zero-entropy-density criterion and minimizing the
achievable distortion
$D = \lim_{\beta \to \infty} \partial[\beta f(\beta)]/\partial\beta$
with respect to $w$, one can obtain $R(D) = h_2(p) - h_2(D)$, which
is identical to the rate-distortion function. From these two
conditions, i.e., the zero entropy density and the minimization of
the achievable distortion, the following relationships are
obtained:

$$ e^{-\beta} = \frac{D}{1 - D}, \tag{6} $$

$$ K_w = \frac{p - D}{1 - 2D}. \tag{7} $$
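The consistency of these relationships with the replica symmetric free energy can be checked numerically. The sketch below, with hypothetical function names, differentiates $\beta f(\beta)$ analytically and confirms that, when $R = h_2(p) - h_2(D)$, $e^{-\beta} = D/(1 - D)$ and $K_w = (p - D)/(1 - 2D)$, the energy $\partial[\beta f]/\partial\beta$ evaluated at that $\beta$ equals $D$ and the entropy density vanishes.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def energy_and_entropy(beta: float, p: float, R: float, K: float):
    """Return (d(beta f)/d beta, s(beta)) for the free energy in the text."""
    e = math.exp(-beta)
    a = e + (1.0 - e) * K
    b = e + (1.0 - e) * (1.0 - K)
    # d(beta f)/d beta, obtained by differentiating -(p ln a + (1-p) ln b)
    energy = e * (p * (1.0 - K) / a + (1.0 - p) * K / b)
    beta_f = -(p * math.log(a) + (1.0 - p) * math.log(b) + R * math.log(2.0))
    entropy = beta * energy - beta_f  # s = beta * (energy - f)
    return energy, entropy


# Example values (chosen for illustration only).
p, D_target = 0.8, 0.1
R = h2(p) - h2(D_target)
beta = math.log((1.0 - D_target) / D_target)   # eq. (6)
K = (p - D_target) / (1.0 - 2.0 * D_target)    # eq. (7)
energy, entropy = energy_and_entropy(beta, p, R, K)
```

Both `energy - D_target` and `entropy` come out at the level of floating-point rounding for this choice of values.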



The definition of the compressor means that it has exponential
complexity. We therefore utilize a suboptimal algorithm based on
message passing to construct the compressor. 21) Instead of the
maximization of $p(s; x)$, we use a symbol MAP encoding scheme,
which is the maximization of the marginal distribution,

$$ \xi_i = \operatorname*{argmax}_{s_i \in \{-1, 1\}} \sum_{s \setminus s_i \in \{-1, 1\}^{N-1}} p(s; x). \tag{8} $$



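To evaluate the marginal distribution, we apply the belief
propagation. Since $\mathcal{G}(-s) = \mathcal{G}(s)$ holds for any
$s$, the expectation value of $s_i$ becomes zero. To avoid this
uncertainty, we heuristically introduce an inertia term as a prior,
which gives the following inertia-term-introduced belief
propagation: 21)

$$ \hat{\rho}^t_{ki}(s_i) = \alpha_{ki} \sum_{\{s_{i'}\}_{i' \in \mathcal{L}(k) \setminus i}} e^{-\beta G_k(s; x_k)} \prod_{i' \in \mathcal{L}(k) \setminus i} \rho^t_{i'k}(s_{i'}), \tag{9} $$

$$ \rho^{t+1}_{ik}(s_i) = \alpha_{ik} r^t_i(s_i) \prod_{k' \in \mathcal{M}(i) \setminus k} \hat{\rho}^t_{k'i}(s_i). \tag{10} $$

A pseudo marginal can be evaluated as

$$ q^{t+1}_i(s_i) = \alpha_i r^t_i(s_i) \prod_{k \in \mathcal{M}(i)} \hat{\rho}^t_{ki}(s_i), \tag{11} $$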



where $\alpha_{ki}$, $\alpha_{ik}$, and $\alpha_i$ denote
normalization constants and
$\mathcal{M}(i) = \{k \mid a_{ki} \neq 0\}$. Here, the function
$r^t_i(s_i)$ is a prior introduced as the inertia term, and the
superscript $t$ represents an iteration step. Since $s_i$ is binary,
we can safely put
$\rho^t_{ik}(s_i) = \frac{1}{2}(1 + m_{ik}(t) s_i)$,
$\hat{\rho}^t_{ki}(s_i) = \frac{1}{2}(1 + \hat{m}_{ki}(t) s_i)$, and
$q^t_i(s_i) = \frac{1}{2}(1 + m_i(t) s_i)$. We here define the prior
as $r^t_i(s_i) = \exp\{s_i \tanh^{-1}[\gamma m_i(t)]\}$, where the
parameter $\gamma$ ($0 \le \gamma < 1$) denotes the amplitude of the
inertia term, which is heuristically chosen. When $\gamma = 0$
($r^t_i(s_i) = 1$), the inertia-term-introduced belief propagation
recovers the conventional belief propagation. It should be noted
that the performance does not strongly depend on the detailed shape
of the function, provided that it is an increasing function. How
the inertia term works has not yet been investigated in detail;
however, it is known that the inertia term selects a single peak in
the calculation of the pseudo marginal.

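We calculate the equations of the belief propagation, which gives

$$ \hat{m}_{ki}(t) = \frac{a_{ki} x_k \left(\tanh\frac{\beta}{2}\right) V_{ki}(t)}{1 + x_k \left(\tanh\frac{\beta}{2}\right) U_{ki}(t)}, \tag{12} $$

$$ m_{ik}(t+1) = \tanh\Bigl( \sum_{k' \in \mathcal{M}(i) \setminus k} \tanh^{-1} \hat{m}_{k'i}(t) + \tanh^{-1} \gamma m_i(t) \Bigr), \tag{13} $$

$$ m_i(t+1) = \tanh\Bigl( \sum_{k' \in \mathcal{M}(i)} \tanh^{-1} \hat{m}_{k'i}(t) + \tanh^{-1} \gamma m_i(t) \Bigr), \tag{14} $$

where

$$ U_{ki}(t) = \sum_{\{s_{i'}\}_{i' \in \mathcal{L}(k) \setminus i}} u_w\Bigl( \sum_{i' \in \mathcal{L}(k) \setminus i} a_{ki'} s_{i'} \Bigr) \prod_{i' \in \mathcal{L}(k) \setminus i} \frac{1 + m_{i'k}(t) s_{i'}}{2}, \tag{15} $$

$$ V_{ki}(t) = \sum_{\{s_{i'}\}_{i' \in \mathcal{L}(k) \setminus i}} v_w\Bigl( \sum_{i' \in \mathcal{L}(k) \setminus i} a_{ki'} s_{i'} \Bigr) \prod_{i' \in \mathcal{L}(k) \setminus i} \frac{1 + m_{i'k}(t) s_{i'}}{2}, \tag{16} $$

$$ u_w(x) = \mathbb{I}(-w_2 < x < -w_1) + \mathbb{I}(w_1 < x < w_2) - \mathbb{I}(x < -w_2) - \mathbb{I}(w_2 < x) - \mathbb{I}(-w_1 < x < w_1), \tag{17} $$

$$ v_w(x) = \mathbb{I}(x = -w_2) - \mathbb{I}(x = -w_1) + \mathbb{I}(x = w_1) - \mathbb{I}(x = w_2), \tag{18} $$

and $\mathbb{I}(P)$ denotes an indicator function that takes 1 if
the proposition $P$ is true, and 0 otherwise. After $t_m$
iterations, the $i$ th bit of the codeword can be obtained as
$\xi_i = \mathrm{sgn}[m_i(t_m)]$ using the mean of the pseudo
marginal $m_i(t_m)$. To derive these iterative equations, we use the
identity
$\hat{x}_k = u_w(\sum_{i' \in \mathcal{L}(k) \setminus i} a_{ki'} s_{i'}) + a_{ki} s_i v_w(\sum_{i' \in \mathcal{L}(k) \setminus i} a_{ki'} s_{i'})$,
which holds for any $i \in \mathcal{L}(k)$.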
The computational cost of the terms $U_{ki}(t)$ and $V_{ki}(t)$
is $O(2^C)$, which depends only on the row weight $C$;
namely, it is $O(1)$ with respect to $N$. The complexity
of this algorithm is therefore $O(N)$ when the number of
iterations $t_m$ is fixed.
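The cost structure described above can be seen in a compact sketch of the whole iteration, eqs. (12) to (14). This is an illustrative Python version under several assumptions that are not taken from the paper: the name `bp_encode`, the small random initial messages (needed because an all-zero start is a fixed point of the sign-symmetric problem), the clipping inside `atanh_clipped`, and the dict-per-row representation of $A$. As in eqs. (17) and (18), integer thresholds sharing the parity of $C - 1$ are assumed.

```python
import itertools
import math
import random


def u_w(x, w1, w2):
    """Eq. (17)."""
    return (int(-w2 < x < -w1) + int(w1 < x < w2)
            - int(x < -w2) - int(w2 < x) - int(-w1 < x < w1))


def v_w(x, w1, w2):
    """Eq. (18)."""
    return int(x == -w2) - int(x == -w1) + int(x == w1) - int(x == w2)


def atanh_clipped(m, eps=1e-12):
    """tanh^{-1} with its argument clipped away from +-1 to stay finite."""
    return math.atanh(max(-1.0 + eps, min(1.0 - eps, m)))


def bp_encode(A, x, N, beta, gamma, w1, w2, t_max=50, rng=None):
    """Inertia-term BP sketch. A[k] is a dict {i: a_ki} holding row k of A."""
    rng = rng or random.Random(0)
    t_half = math.tanh(beta / 2.0)
    # Variable-to-check means m_{ik}; small random start (an assumption).
    m_vc = [{i: rng.uniform(-0.1, 0.1) for i in row} for row in A]
    m = [0.0] * N                    # pseudo-marginal means m_i(t)
    cols = [[] for _ in range(N)]    # M(i): checks connected to variable i
    for k, row in enumerate(A):
        for i in row:
            cols[i].append(k)
    for _ in range(t_max):
        # Check-to-variable update, eq. (12): O(2^(C-1)) work per edge.
        h_cv = [dict() for _ in A]   # stores atanh(m_hat_{ki})
        for k, row in enumerate(A):
            for i, a_ki in row.items():
                others = [(a, m_vc[k][j]) for j, a in row.items() if j != i]
                U = V = 0.0
                for s in itertools.product((-1, 1), repeat=len(others)):
                    weight, y = 1.0, 0
                    for (a, m_j), s_j in zip(others, s):
                        weight *= 0.5 * (1.0 + m_j * s_j)
                        y += a * s_j
                    U += u_w(y, w1, w2) * weight
                    V += v_w(y, w1, w2) * weight
                m_hat = a_ki * x[k] * t_half * V / (1.0 + x[k] * t_half * U)
                h_cv[k][i] = atanh_clipped(m_hat)
        # Variable updates, eqs. (13) and (14), with the inertia term.
        new_m = [0.0] * N
        for i in range(N):
            total = atanh_clipped(gamma * m[i]) + sum(h_cv[k][i] for k in cols[i])
            new_m[i] = math.tanh(total)
            for k in cols[i]:
                m_vc[k][i] = math.tanh(total - h_cv[k][i])
        m = new_m
    return [1 if m_i >= 0 else -1 for m_i in m]
```

Each iteration touches the $MC$ edges once, and the work per edge depends only on $C$, so for fixed $R$, $C$, and `t_max` the running time of this sketch grows linearly with $N$.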

Utilizing eqs. (6) and (7), which are obtained in the
large-row-weight-limit analysis, we can approximately set all
the parameters of our scheme with finite row weights, namely
$C$, $w_1$, $w_2$, and $\beta$, except $\gamma$.

[Figure 1: two panels, each with the legend: Shannon bound, time
sharing bound, proposed algorithm (squares), Hosaka's algorithm
(circles).]

Fig. 1. Empirical compression performance against the code rate
$R$ for typical source bias $p$. The proposed algorithm (squares)
and Hosaka's algorithm (circles) are shown. The length of the
codewords is $N = 420$, and all the measurements are averaged over
10 runs. The parameter $\gamma$ is chosen within
$\{0.2, 0.3, 0.4, 0.5\}$. The row weight $C$ is chosen within
$C \le 8$ ($C_{\max} = 8$). Top: $p \in \{0.6, 0.8\}$. Bottom:
$p \in \{0.7, 0.9\}$.

We first consider a setting of the parameter $\beta$. Using
eq. (6) and the rate-distortion function, we set $\beta$ as
$\beta = \beta_c(p, R)$ for the given source bias $p$ and code rate
$R$, where
$\beta_c(p, R) = \ln([h_2^{-1}(h_2(p) - R)]^{-1} - 1)$. Here,
$h_2^{-1}$ denotes the inverse function of the binary entropy
function.
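The setting of $\beta$ described above can be sketched numerically as follows. The inverse of the binary entropy function has no closed form, so this sketch uses bisection on $[0, 1/2]$, where $h_2$ is increasing; restricting the inverse to that branch, and the names `h2_inv` and `beta_c`, are assumptions made here for illustration.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def h2_inv(y: float, tol: float = 1e-12) -> float:
    """Inverse of h2 restricted to [0, 1/2], found by bisection."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_c(p: float, R: float) -> float:
    """beta_c(p, R) = ln(1 / D - 1) with D = h2_inv(h2(p) - R)."""
    if not 0.0 <= R < h2(p):
        raise ValueError("R must satisfy 0 <= R < h2(p)")
    D = h2_inv(h2(p) - R)
    return math.log(1.0 / D - 1.0)
```

For a source bias $p = 0.8$ and a rate chosen so that the target distortion is $D = 0.1$, `beta_c` returns approximately $\ln 9$.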

We next consider a setting of the parameters $C$, $w_1$, and
$w_2$. Each element of the vector $As$ is the summation of $C$
Bernoulli random variables $1 - 2\,\mathrm{Ber}(0.5)$, where
$s \in \{-1, 1\}^N$ denotes a candidate of a codeword. This is a
similar situation to the large row weight limit. To keep the row
weight finite, we restrict the row weight as $C \le C_{\max}$. Using
eq. (7) and the rate-distortion function, we set $(C, w_1, w_2)$ as

$$ (C, w_1, w_2) = \operatorname*{argmin}_{(C', w_1', w_2') \in \mathcal{D}(C_{\max})} |\hat{K}(C', w_1', w_2') - K(p, R)|, $$

where

$$ \hat{K}(C, w_1, w_2) = \sum_{n \in \{0, \cdots, C\} : g_w(C - 2n) = -1} 2^{-C} \binom{C}{n}, $$

$$ K(p, R) = \frac{p - h_2^{-1}(h_2(p) - R)}{1 - 2 h_2^{-1}(h_2(p) - R)}, $$

and $\mathcal{D}(C_{\max}) = \{(C, w_1, w_2) \mid 2 \le C \le C_{\max}, 0 < w_1 \le C - 1, w_1 < w_2 \le C + 1\}$
for the given $p$ and $R$. Note that the parameter set that gives
the second smallest value might provide better performance.
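The selection of $(C, w_1, w_2)$ is a small exhaustive search, which the following Python sketch illustrates. The function names are hypothetical, the inverse entropy is again taken on $[0, 1/2]$ by bisection, and the loop bounds encode one reading of the constraint set ($2 \le C \le C_{\max}$, $1 \le w_1 \le C - 1$, $w_1 < w_2 \le C + 1$ over integers), which should be treated as an assumption.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def h2_inv(y: float, tol: float = 1e-12) -> float:
    """Inverse of h2 restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def g_w(z: int, w1: int, w2: int) -> int:
    """Eq. (2)."""
    return 1 if w1 < abs(z) < w2 else -1


def K_hat(C: int, w1: int, w2: int) -> float:
    """Probability that g_w maps a sum of C fair +-1 variables to -1."""
    hits = sum(math.comb(C, n) for n in range(C + 1)
               if g_w(C - 2 * n, w1, w2) == -1)
    return hits / 2 ** C


def K_target(p: float, R: float) -> float:
    """K(p, R) with D = h2_inv(h2(p) - R)."""
    D = h2_inv(h2(p) - R)
    return (p - D) / (1.0 - 2.0 * D)


def choose_parameters(p: float, R: float, C_max: int = 8):
    """Return (gap, C, w1, w2) minimizing |K_hat - K(p, R)| over the grid."""
    target = K_target(p, R)
    best = None
    for C in range(2, C_max + 1):
        for w1 in range(1, C):
            for w2 in range(w1 + 1, C + 2):
                gap = abs(K_hat(C, w1, w2) - target)
                if best is None or gap < best[0]:
                    best = (gap, C, w1, w2)
    return best
```

Keeping the full sorted list of gaps instead of only the minimum would make it easy to also try the runner-up parameter set mentioned in the text.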

Lastly, $\gamma$ is determined by trial and error. In this study,
we choose $\gamma$ only within $\{0.2, 0.3, 0.4, 0.5\}$.

The empirical compression performance is shown in Fig. 1. In this
figure, the distortion averaged over 10 runs is plotted as a
function of the code rate $R$ for the source bias
$p \in \{0.6, 0.7, 0.8, 0.9\}$. The length of codewords is fixed at
$N = 420$, and the length of the original sequence is adjusted. We
here choose $C_{\max} = 8$ and $p > 0.5$. In this figure, the time
sharing bound is also shown. The time sharing bound is given by
$R(D) = (1 - \frac{D}{p}) h_2(p)$, which denotes the compression
performance achieved by the time sharing scheme of a lossless
coding ($R(0) = h_2(p)$) and a trivial encoding that always outputs
an all-one vector for any input ($R(p) = 0$). It can be confirmed
that the proposed linear complexity compressor (with $O(N)$
complexity) has slightly better performance than Hosaka's
algorithm (with $O(N^2)$ complexity).

We observed poorer performance in the small-$p$ region, where the
source bias is less than about 0.2. In this region,
$\min|\hat{K}(C', w_1', w_2') - K(p, R)|$ is not as small as it is
in the large-$p$ region. When we compress an original sequence $x$
whose bias is $p < 0.5$, we can first flip it as $-x$ and then
compress it. The information for indicating whether the sequence
has been flipped requires one bit. To reduce
$\min|\hat{K}(C', w_1', w_2') - K(p, R)|$, it might be helpful to
introduce more complex nonlinear functions.

In this study, we have proposed a scheme using a nonlinear
function and a sparse matrix, as well as a
linear complexity message passing compressor based on
the inertia-term-introduced belief propagation. The pro- 
posed method can treat redundant memoryless sources 
and has near-optimal compression performance for mod- 
erate block lengths. The adjustment of the column weight 
distribution of the sparse matrix might enable us to im- 
prove the compression performance. The analysis of this 
scheme with finite row weights is one of our future stud- 
ies. 